Practical guide for checking assumptions and handling violations in South Asian development research

Common non-linear relationships:

- Income vs health outcomes: diminishing returns at high income
- Education vs fertility: steep decline, then plateau
- Distance vs service use: threshold effects


Distribution problems common in development data:

- Income data: often highly skewed, because a few very wealthy households stretch the right tail
- Agricultural yields: variance may differ across farm sizes
- Education scores: floor and ceiling effects are common

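
The effect of a log transformation on skewed income can be checked directly. This minimal sketch computes the moment-based sample skewness (g1, no small-sample correction) before and after taking logs. The income figures are hypothetical.

```python
import math
from statistics import fmean

def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    mu = fmean(values)
    m2 = fmean([(v - mu) ** 2 for v in values])
    m3 = fmean([(v - mu) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Hypothetical annual household incomes (INR): most modest, a few very large
incomes = [10000, 12000, 15000, 20000, 25000, 35000, 50000, 80000, 150000, 400000]

raw_skew = skewness(incomes)
log_skew = skewness([math.log(x) for x in incomes])
print(f"Skewness of income:     {raw_skew:.2f}")
print(f"Skewness of log income: {log_skew:.2f}")
```

The plain log is undefined for households reporting zero income. In that case `math.log1p` (log of x + 1) or the inverse hyperbolic sine (`math.asinh`) are common substitutes.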

Common sources of clustering (non-independence):

- Village clustering: people in the same village share infrastructure, weather, and policies
- Household clustering: family members share economic conditions
- Regional clustering: states and districts differ in governance

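
One way to gauge how much village clustering matters is to estimate the intraclass correlation (ICC) and convert it into a design effect. The sketch below uses the one-way ANOVA estimator of the ICC and Kish's design effect, DEFF = 1 + (m - 1) × ICC. It assumes every cluster has the same size m. The survey dimensions and the ICC of 0.10 are hypothetical.

```python
from statistics import fmean

def icc_anova(clusters):
    """One-way ANOVA estimate of the intraclass correlation (ICC).
    Assumes every cluster (e.g. village) has the same number of households."""
    k = len(clusters)                 # number of clusters
    m = len(clusters[0])              # observations per cluster
    grand_mean = fmean([x for c in clusters for x in c])
    cluster_means = [fmean(c) for c in clusters]
    # Between-cluster and within-cluster mean squares
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    msw = sum((x - cm) ** 2
              for c, cm in zip(clusters, cluster_means) for x in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)

def design_effect(m, icc):
    """Kish design effect for equal cluster sizes m: 1 + (m - 1) * ICC."""
    return 1 + (m - 1) * icc

# Hypothetical survey: 50 villages x 20 households, assumed ICC of 0.10
n, m, icc = 1000, 20, 0.10
deff = design_effect(m, icc)
n_eff = n / deff
print(f"Design effect: {deff:.2f}; effective sample size: {n_eff:.0f}")
# -> Design effect: 2.90; effective sample size: 345
```

Even a modest ICC means these 1,000 households carry roughly the information of 345 independent ones. Standard errors of a mean grow by about the square root of DEFF, which is why cluster-robust standard errors or multilevel models are needed.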

Common sources of unequal variance (heteroscedasticity):

- Income data: control groups often show less variance than treatment groups
- Test scores: rural and urban schools may differ sharply in variance
- Agricultural yields: irrigated and rain-fed areas show different variability

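
Unequal variance between groups such as rural and urban schools can be tested formally. This sketch computes the Brown-Forsythe variant of Levene's statistic W, which runs an ANOVA on absolute deviations from each group's median. A large W relative to an F(k - 1, N - k) critical value suggests unequal variances. The test scores are hypothetical.

```python
from statistics import fmean, median

def brown_forsythe(*groups):
    """Levene-type statistic W using deviations from each group's median.
    Larger W means the groups' spreads differ more."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations from each group's own median
    z = [[abs(x - median(g)) for x in g] for g in groups]
    z_means = [fmean(zg) for zg in z]
    z_grand = fmean([v for zg in z for v in zg])
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zg, zm in zip(z, z_means) for v in zg)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical test scores: urban schools spread widely, rural ones cluster
urban = [35, 48, 52, 60, 71, 80, 88, 95]
rural = [40, 42, 44, 45, 46, 47, 49, 50]
print(f"Brown-Forsythe W = {brown_forsythe(urban, rural):.2f}")
```

If SciPy is available, `scipy.stats.levene(urban, rural, center='median')` computes the same statistic together with a p-value.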

Predictors that often move together (multicollinearity):

- Education and income: highly correlated
- Infrastructure variables: water, electricity, and roads often arrive as a bundle
- Health indicators: multiple, overlapping nutrition measures
- Geographic variables: rainfall, temperature, and elevation may be collinear


Common sources of outliers:

- Success stories: exceptionally successful interventions
- Extreme poverty: households with very low income or assets
- Urban-rural differences: a few urban areas within an otherwise rural sample
- Data errors: recording mistakes and unit confusion

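
A simple first screen for outliers is Tukey's fences: flag any value below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. The sketch below uses `statistics.quantiles` with its default method. The incomes are hypothetical, and the last entry mimics a unit-entry error.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical monthly incomes (INR); the last one looks like a unit error
incomes = [8000, 9500, 10000, 11000, 12000, 12500, 13000, 14000, 15000, 1500000]
print(tukey_outliers(incomes))  # -> [1500000]
```

A flagged value is a prompt to check the source record, not to delete it. A data error should be corrected, while a genuine success story or an extremely poor household should stay in the data and be handled with robust methods.
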

| Assumption | Method | Severity if Violated | Primary Consequence |
|---|---|---|---|
| Independence | All methods | CRITICAL | Invalid p-values, wrong conclusions |
| Linearity | Correlation, Regression | HIGH | Missed relationships, poor predictions |
| Normality | All methods | MEDIUM (less serious with large n) | Slightly inaccurate p-values |
| Homoscedasticity | ANOVA, Regression | MEDIUM | Inefficient estimates, incorrect standard errors |
| Multicollinearity | Regression | MEDIUM | Unstable coefficients, interpretation issues |

| Challenge | Description | Statistical Impact | Recommended Solution |
|---|---|---|---|
| Seasonal effects | Agricultural data varies by monsoon | Non-independence, heteroscedasticity | Include season controls, cluster by year |
| Village clustering | Households in same village are similar | Independence violation | Cluster-robust SE, multilevel models |
| Extreme inequality | Very skewed income distributions | Non-normality, outliers | Log transformation, robust methods |
| Missing data patterns | Non-random missingness | Selection bias | Multiple imputation, selection models |
| Floor/ceiling effects | Many zero values or maximum scores | Non-normality, non-linearity | Tobit models, transformations |
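
Before choosing a Tobit model or a transformation, it helps to measure how much of the sample sits at the scale's limits. This sketch computes the share of observations exactly at the floor and at the ceiling. The 0-20 reading scores are hypothetical. Shares of roughly 15% or more at either end are sometimes used as a rough flag, but treat that threshold as a convention rather than a rule.

```python
def floor_ceiling_shares(scores, floor, ceiling):
    """Share of observations sitting exactly at the scale minimum and maximum."""
    n = len(scores)
    at_floor = sum(1 for s in scores if s == floor) / n
    at_ceiling = sum(1 for s in scores if s == ceiling) / n
    return at_floor, at_ceiling

# Hypothetical reading scores on a 0-20 scale: many children score zero
scores = [0, 0, 0, 0, 0, 3, 5, 7, 8, 10, 12, 14, 15, 18, 20, 20]
low, high = floor_ceiling_shares(scores, floor=0, ceiling=20)
print(f"At floor: {low:.1%}, at ceiling: {high:.1%}")
```
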

The goal is not to meet every assumption perfectly, but to:
This handout is part of the ImpactMojo 101 Knowledge Series
Licensed under CC BY-NC-SA 4.0 • Free to use with attribution • www.impactmojo.in